Abstract

In this short note, we analyze the assumptions made by McDougal et al. [1],
both explicit and implicit, in their estimation of the proportion of "true
recent infections" using the BED CEIA. This enables us to write down
expressions for the sensitivity, short-term specificity and long-term
specificity of a test for recent infection defined by a BED ODn below a
threshold. We then derive an identity which shows the relationship between
these parameters, allowing the elimination of sensitivity and short-term
specificity from an expression relating the proportion of "true recent
infections" to the proportion of seropositive individuals testing below
threshold. This has two important consequences. Firstly, the simplified
formula is substantially more amenable to calibration. Secondly, naively
treating the parameters as independent would lead to an incorrect estimate of
uncertainty due to imperfect calibration.
Elimination of Parameters

In the model proposed by McDougal et al. [1], a BED ODn below some threshold,
for a seropositive individual, is declared to be an imperfect test for recent
infection. They derive an estimate for the true proportion of recent
infections ($P_t$) in terms of the proportion of seropositive individuals that
register under the threshold ($P_o$), a sensitivity ($\sigma$), a short-term
specificity ($\rho_1$) and a long-term specificity ($\rho_2$). Knowledge of
$P_t$ allows the calculation of the 'recent infections in 1 year per number at
risk' in a hypothetical cohort. The present note concerns the correct
calculation of $P_t$, but does not address the issue of calculating a risk of
infection using this proportion.

McDougal et al. estimate a window period, being 'the mean period of time from
initial seroconversion to reaching an ODn of 0.8'. Presumably non-progressors
(those not reaching the threshold) are censored. More specifically, this
implies that the window period is the mean threshold crossing time conditional
on progression (i.e. actually reaching the threshold). Sensitivity of the test
is calculated for an interval corresponding to the window period. Short-term
specificity is calculated for 'the interval immediately after, and equal in
duration to, the window period'. Long-term specificity is for 'the period
thereafter (where the curve is flat)'. The 'curve' being referred to here is
the survival function (for the calibration sample) in the state of being under
the threshold, conditional on being alive, which we denote $S_{U|A}(t)$.
McDougal et al. explicitly make the following assumptions, with the
justification that they 'are reasonable as very little attrition (from death)
during the first two time intervals after infection would be expected':

1. 'Recent infections are randomly distributed within the first window 
period'. 

2. 'The number of persons in the interval of equal duration immediately 
after the mean window period equals the number in the first window 
period'. 

3. 'The remainder of the population is more than two window periods 
since seroconversion'. 

While it may be true in the situation being explored here, we note that it is
not a priori obvious that the choice of equal window periods ensures that
$S_{U|A}(t)$ is flat after twice the window period. With this in mind, we
propose a generalization in which the two window periods are allowed to have
arbitrary values $\omega_1$ and $\omega_2$, as long as all individuals that
progress do so in a time less than $\omega_1 + \omega_2$ after seroconversion
(i.e. $S_{U|A}(t)$ is flat for $t > \omega_1 + \omega_2$, see the bottom graph
of Figure 1). For analytical convenience, we introduce $S_{PU|A}(t)$, the
survival in the state of being under threshold for progressors (i.e. those who
reach the threshold). We also introduce $P_{NP}$, the fraction of individuals
that fail to progress. Then $S_{U|A}(t)$, $S_{PU|A}(t)$ and $P_{NP}$ are
related by

\[ S_{U|A}(t) = (1 - P_{NP})\, S_{PU|A}(t) + P_{NP}. \]

Figure 1: The six sector model of McDougal et al., showing counts (above) and
the survival function $S_{U|A}(t)$ (below) versus time since infection.

The introduction of $S_{PU|A}(t)$ allows us to provide a precise definition of
the window period used by McDougal et al., being the mean period of time, from
initial seroconversion to reaching the ODn threshold, spent by those who
progress:

\[ \omega := E[\tau_{PU|A}] = \int_0^\infty S_{PU|A}(t)\, dt, \]

which follows from integration by parts on the relevant density function.
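As a minimal numerical sketch of this definition, the following Python fragment integrates a purely hypothetical progressor survival function to obtain the window period. The crossing times (uniform between 60 and 260 days), the function names and the trapezoidal integrator are all our own illustrative assumptions and are not taken from McDougal et al.

```python
def survival_progressors(t, a=60.0, b=260.0):
    """Hypothetical S_PU|A(t): threshold crossing times uniform on [a, b] days.

    The shape and the values of a and b are assumptions made for illustration.
    """
    if t <= a:
        return 1.0
    if t >= b:
        return 0.0
    return (b - t) / (b - a)


def trapezoid(func, lower, upper, steps=10000):
    """Composite trapezoidal rule on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h


# The survival function vanishes beyond b = 260, so integrating to 400 days
# is equivalent to integrating to infinity.
omega = trapezoid(survival_progressors, 0.0, 400.0)

# For uniform crossing times the mean is analytically (a + b) / 2 = 160 days.
print(f"window period omega = {omega:.3f} days")
```

The point of the sketch is only that the window period is the area under the progressor survival curve, so any estimate of that curve yields an estimate of $\omega$.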

Assumption 1 above means that infection times in the first window period are
uniformly distributed. Although assumption 2 merely states that the number of
infections in the second window period is equal to the number in the first, we
shall see later that this is not strong enough to produce a definition of
$\rho_1$ which is independent of the state of the sample. It is necessary to
specify the stronger assumption that the infection events in the second window
period are also uniformly distributed with the same intensity as in the first
window period. We see below that this assumption is implicit in the work of
McDougal et al. To make this more tangible, we denote the density of infection
times of individuals in the sample by $f(t)$, with the number of seropositive
individuals given by $N_{sp} = \int_0^\infty f(t)\, dt$. Then in our model of
general window periods $f(t) = f_0$ for all $t \in [0, \omega_1 + \omega_2)$,
and it follows that the ratio of infected people in the second window period
to those in the first period is $\omega_2/\omega_1$. It should be noted that
$f(t)$ depends on incidence, susceptible population and life expectancies over
the history of the epidemic. With reference to Figure 1, we are now in a
position to write expressions for the number of seropositive individuals in
each sector:

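\[
\begin{aligned}
n_1 &= \int_0^{\omega_1} f(t)\,(1 - S_{U|A}(t))\, dt
     = f_0 (1 - P_{NP}) \int_0^{\omega_1} (1 - S_{PU|A}(t))\, dt, \\
n_2 &= \int_0^{\omega_1} f(t)\, S_{U|A}(t)\, dt
     = f_0\, \omega_1 P_{NP} + f_0 (1 - P_{NP}) \int_0^{\omega_1} S_{PU|A}(t)\, dt, \\
n_3 &= \int_{\omega_1}^{\omega_1+\omega_2} f(t)\,(1 - S_{U|A}(t))\, dt
     = f_0 (1 - P_{NP}) \int_{\omega_1}^{\omega_1+\omega_2} (1 - S_{PU|A}(t))\, dt, \\
n_4 &= \int_{\omega_1}^{\omega_1+\omega_2} f(t)\, S_{U|A}(t)\, dt
     = f_0\, \omega_2 P_{NP} + f_0 (1 - P_{NP}) \int_{\omega_1}^{\omega_1+\omega_2} S_{PU|A}(t)\, dt, \\
n_5 &= \int_{\omega_1+\omega_2}^{\infty} f(t)\,(1 - S_{U|A}(t))\, dt
     = (1 - P_{NP}) \int_{\omega_1+\omega_2}^{\infty} f(t)\, dt, \\
n_6 &= \int_{\omega_1+\omega_2}^{\infty} f(t)\, S_{U|A}(t)\, dt
     = P_{NP} \int_{\omega_1+\omega_2}^{\infty} f(t)\, dt.
\end{aligned}
\]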
Using the above expressions, the sensitivity, the short-term specificity and
the long-term specificity are given by

\[
\begin{aligned}
\sigma &= \frac{n_2}{n_1 + n_2}
        = \frac{(1 - P_{NP}) \int_0^{\omega_1} S_{PU|A}(t)\, dt + \omega_1 P_{NP}}{\omega_1}, \\
\rho_1 &= \frac{n_3}{n_3 + n_4}
        = \frac{(1 - P_{NP}) \int_{\omega_1}^{\omega_1+\omega_2} (1 - S_{PU|A}(t))\, dt}{\omega_2}, \\
\rho_2 &= \frac{n_5}{n_5 + n_6} = 1 - P_{NP}.
\end{aligned}
\]
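To illustrate how the three calibration parameters arise from the sector counts, the sketch below evaluates the six sectors for a hypothetical progressor survival function and a flat infection density. The survival shape, the non-progressor fraction of 0.05, the window lengths and the tail masses are all assumed values chosen for illustration only; the helper names are ours.

```python
def s_pu(t, a=60.0, b=260.0):
    """Hypothetical S_PU|A(t): crossing times uniform on [a, b] days (assumed)."""
    if t <= a:
        return 1.0
    if t >= b:
        return 0.0
    return (b - t) / (b - a)


def integrate(func, lower, upper, steps=4000):
    """Composite trapezoidal rule on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h


def sector_counts(f0, p_np, w1, w2, tail_mass):
    """Sector counts n1..n6 for a flat infection density f0 on [0, w1 + w2).

    tail_mass stands for the integral of f(t) beyond w1 + w2. The formulas
    assume w1 + w2 exceeds the latest crossing time, so S_U|A is flat there.
    """
    under_1 = integrate(s_pu, 0.0, w1)       # integral of S_PU|A over window 1
    under_2 = integrate(s_pu, w1, w1 + w2)   # integral of S_PU|A over window 2
    n1 = f0 * (1 - p_np) * (w1 - under_1)
    n2 = f0 * w1 * p_np + f0 * (1 - p_np) * under_1
    n3 = f0 * (1 - p_np) * (w2 - under_2)
    n4 = f0 * w2 * p_np + f0 * (1 - p_np) * under_2
    n5 = (1 - p_np) * tail_mass
    n6 = p_np * tail_mass
    return n1, n2, n3, n4, n5, n6


def calibration_parameters(counts):
    """Sensitivity, short-term and long-term specificity from sector counts."""
    n1, n2, n3, n4, n5, n6 = counts
    return n2 / (n1 + n2), n3 / (n3 + n4), n5 / (n5 + n6)


# Two different (hypothetical) epidemic states: different f0 and tail mass.
params_a = calibration_parameters(sector_counts(2.0, 0.05, 160.0, 160.0, 5000.0))
params_b = calibration_parameters(sector_counts(7.0, 0.05, 160.0, 160.0, 900.0))
print(params_a)
print(params_b)
```

Under these assumptions the two parameter triples coincide, which reflects the fact that the infection density and the tail mass divide out of all three ratios.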

We can now see why the assumption of uniform distribution of infection events
for the first and second window periods is required: it is the only way in
which we can get a cancellation of $f(t)$ in the expressions for $\sigma$ and
$\rho_1$.

We also see why it is necessary that $S_{U|A}(t)$ be flat after both window
periods: it ensures that $S_{U|A}(t)$ is constant and can be pulled out of the
integrals in $n_5$ and $n_6$ as the factor $P_{NP}$. This is necessary for
$\rho_2$ to be independent of $f(t)$.

Furthermore, we now show that in order to specify $\rho_2$ so that it is
independent of the state of the epidemic, an implicit assumption is being made
that survival is the same for progressors and non-progressors. Under bias-free
recruitment into the survey, we have

\[ f(t) = \frac{N_{sp}}{T_{sp}}\, H(-t)\, I(-t)\, S_A(t), \]

where $H(t)$ is the number of healthy (susceptible) individuals, $I(t)$ is the
instantaneous incidence, $S_A(t)$ is the life-expectancy survival function
measured from the time since infection and

\[ T_{sp} = \int_0^\infty H(-t)\, I(-t)\, S_A(t)\, dt \]

is the total number of seropositive individuals alive in the population at
$t = 0$. The ratio $N_{sp}/T_{sp}$ is just the fraction of the total
seropositive population that has been recruited into the survey. Now, note
that $f(t)$ is used symmetrically in the expressions for $n_5$ and $n_6$. If
different life expectancies were used in these formulae, reflecting a
difference in survival for progressors and non-progressors, the $f$s in these
formulae would need to be different, and would not cancel in the expression
for $\rho_2$. This assumption is not explicitly stated by McDougal et al. but
is implicit in arriving at a $\rho_2$ that is independent of epidemic state.

With the calibration parameters specified precisely, we now derive an estimate
for the proportion of seropositive people $P_t$ who were infected at a time
less than $\omega_1$ before the present; these are the true recent infections.
A generalization of equation (1) in McDougal et al., relating the proportion
of true recent infections to the observed proportion of the population that
tested recent, is given by

\[ P_o = P_t\, \sigma + P_t \frac{\omega_2}{\omega_1} (1 - \rho_1)
       + \left(1 - P_t - P_t \frac{\omega_2}{\omega_1}\right) (1 - \rho_2). \]

This means that we can solve for the true proportion of recently infected
individuals to get

\[ P_t = \frac{P_o + \rho_2 - 1}
              {\sigma - \frac{\omega_2}{\omega_1}\rho_1
               + \left(1 + \frac{\omega_2}{\omega_1}\right)\rho_2 - 1}. \tag{1} \]

Note that this equation reduces to the one derived by McDougal et al. when one
sets $\omega_1 = \omega_2$:

\[ P_t = \frac{P_o + \rho_2 - 1}{\sigma - \rho_1 + 2\rho_2 - 1}. \tag{2} \]
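The relation between the observed and true proportions, and its inversion, can be sketched in a few lines of Python. The function names are ours, the calibration values are those of McDougal et al. as quoted later in this note, and the true proportion of 0.10 is a hypothetical input used only to demonstrate the round trip.

```python
def observed_proportion(p_t, sigma, rho1, rho2, w1=1.0, w2=1.0):
    """Forward relation: proportion testing below threshold, given P_t."""
    ratio = w2 / w1
    return (p_t * sigma
            + p_t * ratio * (1 - rho1)
            + (1 - p_t - p_t * ratio) * (1 - rho2))


def true_recent_proportion(p_o, sigma, rho1, rho2, w1=1.0, w2=1.0):
    """Inverse relation, equation (1); with w1 == w2 it is equation (2)."""
    ratio = w2 / w1
    denominator = sigma - ratio * rho1 + (1 + ratio) * rho2 - 1
    return (p_o + rho2 - 1) / denominator


# Calibration values quoted from McDougal et al.; p_t_assumed is hypothetical.
sigma, rho1, rho2 = 0.768, 0.723, 0.944
p_t_assumed = 0.10

p_o = observed_proportion(p_t_assumed, sigma, rho1, rho2)
p_t_recovered = true_recent_proportion(p_o, sigma, rho1, rho2)
print(p_o, p_t_recovered)

# The same round trip with unequal (hypothetical) window periods.
p_o_general = observed_proportion(p_t_assumed, sigma, rho1, rho2, w1=100.0, w2=200.0)
p_t_general = true_recent_proportion(p_o_general, sigma, rho1, rho2, w1=100.0, w2=200.0)
```

The round trip only checks that the algebra is self-consistent; it says nothing about whether the calibration values themselves are appropriate for a given population.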

Now, for completeness, we provide the precise assumptions that are required in
order to facilitate the analysis in the rest of this paper. We note that, with
the exception of arbitrarily sized window periods, these assumptions are
equivalent to the assumptions (either explicit or implicit) that are being
made by McDougal et al.

Model Assumptions. Specify window periods $\omega_1$ and $\omega_2$. We assume
that:

1. The window periods are chosen so that the survival function $S_{U|A}(t)$ is
flat after $t = \omega_1 + \omega_2$. This means that $S_{PU|A}(t)$ only has
support on the time interval $t \in [0, \omega_1 + \omega_2]$.

2. Arrival times of infection events are uniformly distributed on the interval
$[0, \omega_1 + \omega_2]$.

3. Survival is symmetric for progressing and non-progressing individuals. 

We are now able to provide an important identity that is not anticipated 
in McDougal et al. 

Proposition 1. Under the model assumptions stated above, the following
identity holds:

\[ \sigma - \frac{\omega_2}{\omega_1}\rho_1
   + \left(1 + \frac{\omega_2}{\omega_1} - \frac{\omega}{\omega_1}\right)\rho_2 = 1. \]
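Proof. Since we assume that $S_{PU|A}(t)$ only has support on
$t \in [0, \omega_1 + \omega_2]$, we have

\[ \int_0^{\omega_1+\omega_2} S_{PU|A}(t)\, dt = \int_0^\infty S_{PU|A}(t)\, dt
   = E[\tau_{PU|A}] = \omega. \]

Then, simply evaluating

\[
\begin{aligned}
\sigma - \frac{\omega_2}{\omega_1}\rho_1
&= \frac{(1 - P_{NP}) \int_0^{\omega_1} S_{PU|A}(t)\, dt + \omega_1 P_{NP}}{\omega_1}
 - \frac{\omega_2}{\omega_1} \cdot
   \frac{(1 - P_{NP}) \int_{\omega_1}^{\omega_1+\omega_2} (1 - S_{PU|A}(t))\, dt}{\omega_2} \\
&= \frac{(1 - P_{NP}) \int_0^{\omega_1+\omega_2} S_{PU|A}(t)\, dt
         - \int_{\omega_1}^{\omega_1+\omega_2} (1 - P_{NP})\, dt + \omega_1 P_{NP}}{\omega_1} \\
&= \frac{(1 - P_{NP})(\omega - \omega_2 - \omega_1) + \omega_1}{\omega_1} \\
&= 1 + \left(\frac{\omega - \omega_2}{\omega_1} - 1\right)\rho_2,
\end{aligned}
\]

yields the result directly.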
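A quick numerical check of the identity may help to make it concrete. The sketch below uses a closed-form integral of a hypothetical progressor survival function (crossing times uniform on 60 to 260 days), an assumed non-progressor fraction of 0.05 and three arbitrary pairs of window periods; none of these numbers come from McDougal et al., and the function names are ours.

```python
def cumulative_under(x, a=60.0, b=260.0):
    """Integral of the hypothetical S_PU|A from 0 to x (uniform crossing on [a, b])."""
    if x <= a:
        return x
    if x >= b:
        return 0.5 * (a + b)
    return a + (x - a) - (x - a) ** 2 / (2 * (b - a))


def identity_lhs(w1, w2, p_np, a=60.0, b=260.0):
    """Left hand side of the identity for window periods w1 and w2."""
    if w1 + w2 < b:
        # Model assumption 1: all progressors cross before w1 + w2.
        raise ValueError("w1 + w2 must cover all threshold crossing times")
    omega = cumulative_under(w1 + w2, a, b)   # mean window period
    under_1 = cumulative_under(w1, a, b)      # integral over the first window
    under_2 = omega - under_1                 # integral over the second window
    sigma = ((1 - p_np) * under_1 + w1 * p_np) / w1
    rho1 = (1 - p_np) * (w2 - under_2) / w2
    rho2 = 1 - p_np
    return sigma - (w2 / w1) * rho1 + (1 + w2 / w1 - omega / w1) * rho2


window_pairs = [(160.0, 160.0), (100.0, 200.0), (200.0, 80.0)]
values = [identity_lhs(w1, w2, p_np=0.05) for (w1, w2) in window_pairs]
print(values)
```

Each value equals 1 up to floating point rounding, for equal and unequal window periods alike, provided the two windows together cover all crossing times.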

Using the Proposition, equation (1) simplifies to

\[ P_t = \frac{\omega_1}{\omega} \cdot \frac{P_o + \rho_2 - 1}{\rho_2}. \]

This expression no longer relies on estimates for $\sigma$ and $\rho_1$. It is
also interesting to note that it does not depend on $\omega_2$. Furthermore,
if we set $\omega_1 = \omega$ as in McDougal et al. then we get

\[ P_t = \frac{P_o + \rho_2 - 1}{\rho_2}. \tag{3} \]
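The agreement between the full and the simplified expressions can be checked numerically whenever the calibration parameters satisfy the identity of Proposition 1. In the sketch below the sensitivity is derived from the identity rather than calibrated, and every numerical value (window lengths, specificities and the observed proportion) is a hypothetical choice made for illustration; the function names are ours.

```python
def sigma_from_identity(rho1, rho2, omega, w1, w2):
    """Sensitivity implied by Proposition 1 for the given parameters."""
    return 1 + (w2 / w1) * rho1 - (1 + (w2 - omega) / w1) * rho2


def full_true_recent(p_o, sigma, rho1, rho2, w1, w2):
    """True recent proportion from the three-parameter expression, equation (1)."""
    ratio = w2 / w1
    return (p_o + rho2 - 1) / (sigma - ratio * rho1 + (1 + ratio) * rho2 - 1)


def simplified_true_recent(p_o, rho2, omega, w1):
    """True recent proportion from the simplified expression (no sigma, rho1, w2)."""
    return (w1 / omega) * (p_o + rho2 - 1) / rho2


# Hypothetical inputs: mean window period 160 days, unequal declared windows.
omega, w1, w2 = 160.0, 100.0, 200.0
rho1, rho2 = 0.646, 0.95
p_o = 0.20

sigma = sigma_from_identity(rho1, rho2, omega, w1, w2)
p_t_full = full_true_recent(p_o, sigma, rho1, rho2, w1, w2)
p_t_simple = simplified_true_recent(p_o, rho2, omega, w1)
print(sigma, p_t_full, p_t_simple)
```

Because the second window length enters the full expression only through terms that the identity removes, changing it (together with the matching sensitivity) leaves the result unchanged.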



Discussion 



Note that (2) as stated in McDougal et al. contains three calibration
parameters ($\sigma$, $\rho_1$ and $\rho_2$), while (3) contains only one
calibration parameter ($\rho_2$). Incidence estimates using (3) would still,
however, require the estimation of $\omega$. The method of McDougal et al. can
in principle be applied to an arbitrarily declared window period, as long as
$\sigma$, $\rho_1$ and $\rho_2$ are calibrated for that value. We have
therefore reduced the number of calibration parameters by one.
Estimation of extra parameters may unnecessarily dilute the statistical power
of the calibration data at hand. Moreover, estimates of the uncertainty due to
calibration, based on the assumption of the independence of $\sigma$, $\rho_1$
and $\rho_2$, will produce incorrect results. Note that when one sets
$\omega_1 = \omega_2 = \omega$, the identity is given by

\[ \sigma - \rho_1 + \rho_2 = 1. \]

Substituting the values for the calibrated parameters found by McDougal et
al., namely $\sigma = 0.768$, $\rho_1 = 0.723$ and $\rho_2 = 0.944$, into this
equation gives a value for the left hand side of the equation equal to 0.989,
close to the value of 1 required by the identity. This confirms that their
calibration was reasonably accurate.
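The arithmetic above, and its effect on the estimate, can be reproduced directly. In the sketch below the three calibration values are those quoted from McDougal et al., while the observed proportion of 0.15 is a hypothetical input used only to compare the three-parameter and one-parameter formulas; the function names are ours.

```python
# Calibration values quoted from McDougal et al.
sigma, rho1, rho2 = 0.768, 0.723, 0.944

# Left hand side of the equal-window identity; the identity requires 1.
identity_lhs = sigma - rho1 + rho2


def p_t_three_parameter(p_o):
    """Equation (2): uses sigma, rho1 and rho2."""
    return (p_o + rho2 - 1) / (sigma - rho1 + 2 * rho2 - 1)


def p_t_one_parameter(p_o):
    """Equation (3): uses rho2 only."""
    return (p_o + rho2 - 1) / rho2


p_o_example = 0.15  # hypothetical observed proportion below threshold
relative_gap = p_t_three_parameter(p_o_example) / p_t_one_parameter(p_o_example) - 1

print(f"identity left hand side: {identity_lhs:.3f}")
print(f"relative difference between (2) and (3): {relative_gap:.4f}")
```

The two denominators differ by exactly the shortfall of the identity from 1 (here 0.011), so the two estimates differ by a little over one percent regardless of the observed proportion.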

Perhaps the most important advantage of eliminating $\sigma$ and $\rho_1$ is
that the remaining parameters are more amenable to calibration. The
calibration of $\sigma$ and $\rho_1$ requires obtaining specimens from
individuals with confidence about their time since infection (i.e. using
frequent follow-up). On the other hand, both $\rho_2$ and $\omega$ can be
estimated through follow-up intervals greater than $\omega_1 + \omega_2$. The
estimate for $\rho_2$ comes from the proportion of under-threshold samples
known to be obtained more than $\omega_1 + \omega_2$ since infection (i.e.
second seropositive samples). Given an estimate for $\rho_2$, $\omega$ can be
estimated from the fraction of individuals who test under-threshold on the
first seropositive sample.

We have suggested an alternative incidence estimation paradigm [2] which
requires fewer assumptions than the method of McDougal et al. In this approach
$P_{NP} = 1 - \rho_2$ and $\omega$ emerge as the natural calibration
parameters.